Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Integration Test Framework] fix createTempDir and flaky tests #5409

Merged
merged 5 commits into from
Sep 5, 2024

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Sep 3, 2024

What does this PR do?

createTempDir register a test cleanup function to remove the folder it created, however, on Windows, this folder sometimes fails to be removed because there are still open file handlers for the files within the folder.

We fix this problem by retrying to remove the folder with a maximum overall wait time of 3 seconds. This is a very similar approach to what Go's t.TempDir does.

Fix the flakiness from TestUpgradeHandler* tests by re-working the
mockUpgradeManager, now it accepts a function for its Upgrade method
and their implementation is goroutine safe

Why is it important?

It fixes flakiness on Windows hosts

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

There is not user impact

How to test this PR locally

Run the integration test on Windows:

SNAPSHOT=true TEST_PLATFORMS="windows/amd64" mage -v integration:single TestEventLogFile

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@belimawr belimawr added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team skip-changelog backport-8.15 Automated backport to the 8.15 branch with mergify labels Sep 3, 2024
@belimawr belimawr requested a review from a team as a code owner September 3, 2024 15:53
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make lint happy, otherwise looks good.

@belimawr
Copy link
Contributor Author

belimawr commented Sep 4, 2024

Folks, I fixes some flaky tests (see c187045) that were also affecting this PR.

Could you review those changes as well?

@belimawr belimawr changed the title [Integration Test Framework] fix createTempDir [Integration Test Framework] fix createTempDir and flaky tests Sep 4, 2024
@belimawr belimawr removed the skip-ci label Sep 5, 2024
createTempDir register a test cleanup function to remove the folder it
created, however, on Windows, this folder sometimes fails to be
removed because there are still open file handlers for the files
within the folder.

We fix this problem by retrying to remove the folder with a maximum
overall wait time of 3 seconds. This is a very similar approach to
what Go's t.TempDir does.
Remove the custom removeAll function in favour of the
install.RemovePath that already exists.
Fix the flakiness from TestUpgradeHandler* tests by re-working the
mockUpgradeManager, now it accepts a function for its Upgrade method
and their implementation is goroutine safe
@belimawr belimawr force-pushed the fix-TestEvenLogFile branch from 72924bc to 9d98ac8 Compare September 5, 2024 12:43
Copy link

@belimawr belimawr merged commit 1242e71 into elastic:main Sep 5, 2024
13 checks passed
@belimawr belimawr deleted the fix-TestEvenLogFile branch September 5, 2024 15:50
mergify bot pushed a commit that referenced this pull request Sep 5, 2024
createTempDir register a test cleanup function to remove the folder it
created, however, on Windows, this folder sometimes fails to be
removed because there are still open file handlers for the files
within the folder.

We fix this problem calling install.RemovePath that will retry removing
the folder for about 2s. This is a very similar approach to
what Go's t.TempDir does.

Fix the flakiness from TestUpgradeHandler* tests by re-working the
mockUpgradeManager, now it accepts a function for its Upgrade method
and their implementation is goroutine safe

(cherry picked from commit 1242e71)

# Conflicts:
#	internal/pkg/agent/application/actions/handlers/handler_action_upgrade_test.go
#	pkg/testing/fixture.go
#	testing/integration/event_logging_test.go
belimawr added a commit that referenced this pull request Sep 5, 2024
createTempDir register a test cleanup function to remove the folder it
created, however, on Windows, this folder sometimes fails to be
removed because there are still open file handlers for the files
within the folder.

We fix this problem calling install.RemovePath that will retry removing
the folder for about 2s. This is a very similar approach to
what Go's t.TempDir does.

Fix the flakiness from TestUpgradeHandler* tests by re-working the
mockUpgradeManager, now it accepts a function for its Upgrade method
and their implementation is goroutine safe

(cherry picked from commit 1242e71)
belimawr pushed a commit that referenced this pull request Sep 6, 2024
TestLogIngestionFleetManaged was failing because the namespace generated by the integration tests framework was not unique among different tests and test runs, so sometimes collisions would occurs causing some tests to be flaky.

TestDebLogIngestFleetManaged was failing because it also has got Beats logging connection errors before receiving the configuration from Elastic-Agent, now this message is also in the allow list.

When testing .deb the AGENT_KEEP_INSTALLED environment variable is respected.

When an integration test fails, the work directory created by the framework is now kept and its path is printed.

createTempDir register a test cleanup function to remove the folder it created, however, on Windows, this folder sometimes fails to be removed because there are still open file handlers for the files within the folder.

We fix this problem by retrying to remove the folder with a maximum overall wait time of 3 seconds. This is a very similar approach to what Go's t.TempDir does.

Fix the flakiness from TestUpgradeHandler* tests by re-working the
mockUpgradeManager, now it accepts a function for its Upgrade method
and their implementation is goroutine safe

TestEnvWithDefault
Now TestEnvWithDefault unsets all environment variables it sets,
allowing it to be run multiple times using -count.

TestContainerCMDEventToStderr
TestContainerCMDEventToStderr did not call agentFixture.Prepare early
enough leading to an empty STATE_PATH env var, so all state
information was in /usr/share/elastic-agent, which could make
subsequent tests to fail because they could read
/usr/share/elastic-agent/state/container-paths.yml and use a state
path different than the one set in the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.15 Automated backport to the 8.15 branch with mergify skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flaky Test]: TestEventLogFile failed on windows
4 participants